THE VITERBI AND DIFFERENTIAL
TRELLIS DECODING ALGORITHMS


Decoding algorithms based on the trellis representation of a code (block or con- 
volutional) drastically reduce decoding complexity. The best known and most 
commonly used trellis-based decoding algorithm is the Viterbi algorithm [23, 
79, 105]. It is a maximum likelihood decoding algorithm. Convolutional codes
with Viterbi decoding have been widely used for error control in digital
communications over the last two decades. This chapter is concerned with
the application of the Viterbi decoding algorithm to linear block codes. First, 
the Viterbi algorithm is presented. Then, optimum sectionalization of a trellis 
to minimize the computational complexity of a Viterbi decoder is discussed 
and an algorithm is presented. Some design issues for IC (integrated circuit) 
implementation of a Viterbi decoder are considered and discussed. Finally, a 
new decoding algorithm based on the principle of compare-select-add is pre- 
sented. This new algorithm can be applied to both block and convolutional 
codes and is more efficient than the conventional Viterbi algorithm based on 
the add-compare-select principle. This algorithm is particularly efficient for
rate-1/n antipodal convolutional codes and their high-rate punctured codes.
It reduces computational complexity by one-third compared with the Viterbi
algorithm.

10.1 THE VITERBI DECODING ALGORITHM 

The Viterbi algorithm is based on the simple idea that among the paths merging 
into a state in the code trellis, only the most probable path needs to be saved 
for future processing and all the other paths can be eliminated without affecting 
decoding optimality. This elimination of the less probable paths from further 
consideration drastically reduces decoding complexity. The path being saved is 
called the survivor. Therefore, there is a survivor at each state in the trellis 
at every level. The survivors at each level of the code trellis are extended to 
the next level through the composite branches between the two levels. The 
paths that merge into a state at the next level are then compared and the most 
probable path is selected as the survivor. This process continues until the end 
of the trellis is reached. At the end of the trellis, there is only one state, the
final state $\sigma_f$, and there is only one survivor, which is the most likely
codeword.
Decoding is then completed. 

Viterbi decoding of a linear block code based on a sectionalized trellis diagram
$T(\{h_0, h_1, \ldots, h_L\})$, with section boundary locations
$0 = h_0 < h_1 < \cdots < h_L = N$, is carried out serially, section by section,
from the initial state $\sigma_0$ to the final state $\sigma_f$. Suppose the
decoder has processed $j$ trellis sections up to time-$h_j$. There are
$|\Sigma_{h_j}(C)|$ survivors, one for each state in $\Sigma_{h_j}(C)$. These
survivors together with their path (or state) metrics are stored in memory.
To process the $(j+1)$-th section, the decoder executes the following steps:

(1) Each survivor is extended through the composite branches diverging
from it to the next state level at time-$h_{j+1}$.

(2) For each composite branch into a state in $\Sigma_{h_{j+1}}(C)$, find the
single branch with the largest (correlation) metric. The metric computed is
the branch metric.

(3) Replace each composite branch by the branch with the largest metric.

(4) Add the metric of a branch to the metric of the survivor from which the
branch diverges. For each state $\sigma$ in $\Sigma_{h_{j+1}}(C)$, compare the
metrics of the paths converging into it and select the path with the largest
path metric as the survivor terminating at state $\sigma$. This step is called
the add-compare-select (ACS) procedure in the Viterbi algorithm.

The decoder executes the above steps repeatedly, section by section, until it
reaches the final state $\sigma_f$. At this point, there is one and only one survivor,
which is the decoded codeword and the most likely codeword. The information 
bits corresponding to this decoded codeword are then delivered to the user. 
The decoding window is simply the code length. 
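
Steps (1) to (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the trellis encoding (a list of sections, each a list of `(from_state, to_state, parallel_labels)` composite branches), the names `correlation`, `viterbi_decode` and `SPC_SECTIONS`, and the toy 2-section trellis for the (4,3) single-parity-check code are all hypothetical choices made for this example. Code bits are mapped to bipolar values $2u - 1$ and correlation is used as the metric.

```python
from typing import Dict, List, Sequence, Tuple

Label = Tuple[int, ...]
# Hypothetical encoding of a composite branch:
# (from_state, to_state, list of parallel branch labels)
CompositeBranch = Tuple[int, int, List[Label]]


def correlation(label: Label, r: Sequence[float]) -> float:
    """Correlation metric between a branch label (bits) and received values."""
    return sum((2 * u - 1) * x for u, x in zip(label, r))


def viterbi_decode(sections: List[List[CompositeBranch]],
                   boundaries: List[int],
                   r: Sequence[float],
                   initial_state: int = 0) -> Tuple[Label, float]:
    """Section-by-section Viterbi decoding over a sectionalized trellis."""
    # state -> (path metric, path label so far)
    survivors: Dict[int, Tuple[float, Label]] = {initial_state: (0.0, ())}
    for section, (lo, hi) in zip(sections, zip(boundaries, boundaries[1:])):
        r_sec = r[lo:hi]
        nxt: Dict[int, Tuple[float, Label]] = {}
        for src, dst, labels in section:          # step (1): extend survivors
            if src not in survivors:
                continue
            # steps (2)-(3): keep the best parallel branch of the composite branch
            best = max(labels, key=lambda lab: correlation(lab, r_sec))
            branch_metric = correlation(best, r_sec)
            # step (4): add, compare, select
            metric, path = survivors[src]
            cand = (metric + branch_metric, path + best)
            if dst not in nxt or cand[0] > nxt[dst][0]:
                nxt[dst] = cand
        survivors = nxt
    (metric, codeword), = survivors.values()      # single final state
    return codeword, metric


# Toy 2-section trellis of the (4,3) even-parity code; state = running parity.
SPC_SECTIONS = [
    [(0, 0, [(0, 0), (1, 1)]), (0, 1, [(0, 1), (1, 0)])],
    [(0, 0, [(0, 0), (1, 1)]), (1, 0, [(0, 1), (1, 0)])],
]
codeword, metric = viterbi_decode(SPC_SECTIONS, [0, 2, 4], [0.9, -0.2, -1.1, -0.8])
```

In this toy run the hard decisions (1,0,0,0) violate the parity check, and the decoder flips the least reliable position to return the codeword (1,1,0,0).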

Using the above decoding algorithm, the total number of operations (addi- 
tions and comparisons) can be computed easily. This number can be reduced 
significantly if sectionalization of a trellis is done properly [60]. This will be
discussed in the next section. 

10.2 OPTIMUM SECTIONALIZATION OF A CODE TRELLIS: 
LAFOURCADE-VARDY ALGORITHM 

In decoding a block code with the Viterbi algorithm, the total number of com- 
putations depends on the sectionalization of the trellis diagram for the code. A 
sectionalization of a code trellis for a code C that gives the smallest total num- 
ber of computations is called an optimum sectionalization for C. An optimum 
sectionalization is not necessarily unique. In the following, an algorithm for 
finding an optimum sectionalization is presented. This algorithm was devised 
by Lafourcade and Vardy [60]. 

The Lafourcade-Vardy (LV) algorithm is based on the following simple fact:

(F) For any integers $x$ and $y$ with $0 \le x < y \le N$, a section from
time-$x$ to time-$y$ in any sectionalized trellis $T(U)$ with $x, y \in U$ and
$x+1, x+2, \ldots, y-1 \notin U$ is identical.

Let $\varphi(x,y)$ denote the number of computations required in steps (1) to (4) of
the Viterbi algorithm to process the trellis section from time-$x$ to time-$y$ in any
sectionalized code trellis $T(U)$ with $x, y \in U$ and $x+1, x+2, \ldots, y-1 \notin U$.
It follows from the above simple fact (F) that $\varphi(x,y)$ is determined only by $x$
and $y$. Let $\varphi_{\min}(x,y)$ denote the smallest number of computations of steps (1)
to (4) to process the trellis section(s) from time-$x$ to time-$y$ in any sectionalized
code trellis $T(U)$ with $x, y \in U$. The value $\varphi_{\min}(0,N)$ gives the total number
of computations of the Viterbi algorithm for the code trellis with an optimum
sectionalization. Then, it follows from (F) and the definitions of $\varphi(x,y)$ and
$\varphi_{\min}(x,y)$ that

$$
\varphi_{\min}(0,y) =
\begin{cases}
\min\Bigl\{ \varphi(0,y),\; \min_{0 < x < y} \bigl\{ \varphi_{\min}(0,x) + \varphi(x,y) \bigr\} \Bigr\}, & \text{for } 1 < y \le N, \\
\varphi(0,1), & \text{for } y = 1.
\end{cases}
\tag{10.1}
$$

Table 10.1. Optimum sectionalization*.

Code       Optimum sectionalization U                  Complexity             Complexity
                                                       (sectionalization U)   (N-section)
RM_{1,6}   {0,4,8,12,16,24,32,40,48,52,56,60,63,64}    806                    2,825
RM_{2,6}   {0,8,16,32,48,56,61,63,64}                  101,786                425,209
RM_{3,6}   {0,8,16,24,32,40,48,56,64}                  538,799                773,881

We can compute $\varphi_{\min}(0,y)$ for every $y$ with $0 < y \le N$ efficiently in the
following way: The values of $\varphi(x,y)$ for $0 \le x < y \le N$ are computed using
the structure of the trellis section from time-$x$ to time-$y$. First, the value of
$\varphi_{\min}(0,1)$ is computed. For an integer $y$ with $1 < y \le N$, $\varphi_{\min}(0,y)$ can
be computed from $\varphi_{\min}(0,x)$ and $\varphi(x,y)$ with $0 < x < y$. By storing,
for each $y$, where the minimum occurs in the right-hand side of (10.1),
an optimum sectionalization is found from the computation of $\varphi_{\min}(0,N)$.
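
The recursion (10.1) is a short dynamic program. The sketch below assumes that the section cost $\varphi(x,y)$ is supplied as a Python callable `phi`; the function name `optimum_sectionalization` and the toy cost `toy_phi` are hypothetical and serve only to exercise the recursion. Real values of $\varphi(x,y)$ must be derived from the trellis structure of the code at hand.

```python
from typing import Callable, List, Tuple


def optimum_sectionalization(N: int,
                             phi: Callable[[int, int], int]) -> Tuple[int, List[int]]:
    """Return (phi_min(0, N), boundary set U) following recursion (10.1)."""
    phi_min = [0] * (N + 1)   # phi_min[y] holds phi_min(0, y)
    prev = [0] * (N + 1)      # prev[y]: last boundary before y in an optimum solution
    for y in range(1, N + 1):
        best, arg = phi(0, y), 0              # one single section from 0 to y
        for x in range(1, y):                 # or an optimum prefix plus section (x, y)
            cost = phi_min[x] + phi(x, y)
            if cost < best:
                best, arg = cost, x
        phi_min[y], prev[y] = best, arg
    # Backtrack through the stored minimizers to recover the boundaries.
    boundaries = [N]
    while boundaries[-1] != 0:
        boundaries.append(prev[boundaries[-1]])
    return phi_min[N], boundaries[::-1]


def toy_phi(x: int, y: int) -> int:
    """Hypothetical cost: fixed overhead per section plus a term growing with length."""
    return 3 + (y - x) ** 2


cost, U = optimum_sectionalization(4, toy_phi)
```

With this toy cost and N = 4, two sections of length 2 are cheapest. The double loop evaluates `phi` on the order of $N^2$ times.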


Example 10.1 Table 10.1 gives the optimum sectionalizations for three RM 
codes of length 64 using the LV algorithm. 

10.3 SOME DESIGN ISSUES FOR IC IMPLEMENTATION OF
VITERBI DECODERS FOR LINEAR BLOCK CODES

Theoretically, any linear block code can be decoded by applying the Viterbi 
algorithm to a trellis for the code. However, practical limitations preclude 
the application of this algorithm to many good codes with long block lengths. 
The main reasons are the increases in state complexity, state connectivity, and 
branch complexity of the trellises for good block codes as the length of the 
codes increases. Much of the research on maximum likelihood decoding of lin- 
ear block codes with the Viterbi algorithm over a code trellis has focussed on 
the minimization of the number of computations required for decoding a re- 
ceived sequence. If the actual decoding is intended to be performed using a 
stored program approach (a software implementation) that executes the oper- 
ations needed to decode a received sequence sequentially, then this approach 
will lead to the fastest decoding speed. However, if an IC (hardware) imple- 
mentation is intended, then many other factors besides the number of decoding 
computations must be considered. We must consider the factors that affect the 
circuit requirements, wire-routing within an IC chip, chip size, circuit utiliza- 
tion, power consumption, ACS computation speed, and other implementation 
issues. As a result, an alternate approach that is more suitable for IC imple- 
mentation is desired. 

For IC implementation of a Viterbi decoder for a linear block code, besides 
the state and branch complexities, other important trellis structural properties 
that should be included in the design considerations are state connectivity, 
the parallel structure, regularity, and symmetry. Proper use of these structural 
properties may result in a simpler decoding circuit and a higher decoding speed. 

Optimum sectionalization in terms of minimizing the computational complexity,
in general, results in a non-uniformly sectionalized trellis diagram. In
a Viterbi decoder, quantities such as the branch labels, survivor path metrics, 
and survivor path labels generally reside in word registers, which are basically 
an ordered sequence of bit registers. The same hardware is used to process all 
trellis sections. If a register must store a particular variable, such as a branch 
metric or a state metric, it must be designed to accommodate the largest value 
of the variable over all trellis sections. Since the section lengths for a
non-uniformly sectionalized trellis vary from one section to another, the registers
involved must be designed based on the longest section. This may increase the
relative complexity of an IC Viterbi decoder. Therefore, for IC implementation
of a Viterbi decoder, a uniformly sectionalized trellis is more desirable.

Although a minimal trellis reduces the state and branch complexities, the 
states are densely connected. For long codes, this dense connection between
the states causes serious wire-routing (interconnection) problems within an IC
chip for hardware implementation of a Viterbi decoder and requires a large area
of the chip (or a multilayer chip) to accommodate the decoding circuit.
Furthermore, interconnections increase internal communications between various
parts of the decoding circuit, which slows down decoding and increases
power consumption. Let $\rho_{\max}(C)$ be the maximum state space dimension of a
minimal trellis for a code $C$. Then the number of registers required to store the
survivor paths and their metrics must be $2^{\rho_{\max}(C)}$. If a separate ACS circuit
is required for processing each state at each trellis level, then $2^{\rho_{\max}(C)}$ ACS
circuits are needed. If the differences between $\rho_{\max}(C)$ and the state space
dimensions at many section boundary locations are large, then many of the
registers and ACS circuits are not used during the decoding process. This
results in poor hardware utilization efficiency. All the above problems may be
solved or partially solved by using a non-minimal trellis with a proper parallel 
decomposition, as discussed in Chapter 7. Regularity among the trellis sections 
also helps to overcome the above problems and reduces decoding complexity. 
Symmetry structure, such as mirror symmetry, allows bidirectional decoding, 
which speeds up the decoding process. Therefore, for hardware implementation 
of a Viterbi decoder for a linear block code, a non-minimal trellis may result 
in a simpler and faster decoding circuit with a higher hardware utilization effi- 
ciency. In design, both minimal and non-minimal trellises should be considered 
and the one that results in a simpler circuit and a higher decoding speed should 
be used. 

In the following, we examine some key factors that affect the decoding com- 
plexity and speed of a Viterbi decoder based on a minimal or non-minimal trel- 
lis. The non-minimal trellis structure presented in Section 7.1 reduces internal 
communications and allows independent parallel processing of the subtrellises, 
while decreasing the complexity of an IC Viterbi decoder. It has significant 
advantages over the minimal trellis for IC implementation of a Viterbi decoder. 

10.3.1 Hardware Utilization Efficiency and Effective Computational
Complexity

Consider an IC Viterbi decoder based on an $L$-section trellis for a linear $(N, K)$
block code $C$ with section boundary locations at $h_0 = 0, h_1, h_2, \ldots, h_L = N$.
While many VLSI structures have been described for a Viterbi decoder [10,
38, 100], the most widely implemented structure is based on the ACS-array
architecture, wherein each abstract state in the trellis manifests itself as a
physical ACS circuit on the IC, and the same ACS circuits are repeatedly
used for all levels in the trellis. The ACS circuits can be labeled ACS-$l$ for
$1 \le l \le 2^{\rho_{L,\max}(C)}$, where $\rho_{L,\max}(C)$ is the maximum state space dimension
of the $L$-section minimal trellis for $C$. We assume that $\rho_{L,\max}(C)$ is fixed no
matter whether a minimal trellis or a non-minimal trellis is used in the decoder
design.

The ACS circuits work as follows. At time-0, the metrics of the ACS circuits
corresponding to the originating states of each parallel subtrellis are initialized
to 0. At time-$h_1$, the ACS-$l$ corresponding to state $\sigma^{(l)} \in \Sigma_{h_1}(C)$ at the
end of section-1 of the trellis, for $1 \le l \le |\Sigma_{h_1}(C)|$, has the metric of state
$\sigma^{(l)}$. The index of the surviving branch into state $\sigma^{(l)}$ is stored in ACS-$l$.
Continuing in this way, at time-$h_i$, for $1 \le i \le L$, ACS-$l$ corresponding to state
$\sigma^{(l)} \in \Sigma_{h_i}(C)$ will have the metric for $\sigma^{(l)}$ and a sequence of $i$ survivor
branch indices corresponding to the most likely path from the initial state to
$\sigma^{(l)}$.

Whenever the decoder is processing the trellis at a level at which the size
of the state space is smaller than $2^{\rho_{L,\max}(C)}$, a number of ACS circuits will be
idle. If many ACS circuits are inactive, and this occurs often during
the decoding process, the hardware utilization efficiency becomes poor. For
example, consider the minimal 8-section trellis for the (64,42) RM code, $\mathrm{RM}_{3,6}$.
This trellis has a state space dimension profile (0,7,10,13,10,13,10,7,0) with
$\rho_{8,\max}(C) = 13$. For a Viterbi decoder designed based on this trellis, at
time-$h_1$ and -$h_7$, there are $2^{13} - 2^{7} = 8{,}064$ inactive ACS circuits. At
time-$h_2$, -$h_4$ and -$h_6$, there are $2^{13} - 2^{10} = 7{,}168$ inactive ACS circuits.
Only at time-$h_3$ and -$h_5$ are all the ACS circuits active. We see that the
hardware utilization efficiency is very poor for a Viterbi decoder for the
$\mathrm{RM}_{3,6}$ code based on the minimal 8-section trellis using the ACS-array
architecture.

Hardware utilization efficiency can be improved by a proper parallel
decomposition of a minimal trellis into parallel isomorphic subtrellises. The
decomposition results in a non-minimal trellis with the same maximum state space
dimension $\rho_{L,\max}(C)$. Therefore, the number of ACS circuits in the ACS-array
is still the same, but the number of active ACS circuits is increased at many, if
not all, section boundary locations. We illustrate this with an example. Using
the method presented in Section 7.1, the minimal 8-section trellis for the (64,42)
RM code, $\mathrm{RM}_{3,6}$, can be decomposed into a non-minimal 8-section trellis with
128 parallel isomorphic subtrellises, each having a state space dimension
profile (0,6,6,6,3,6,6,6,0). Therefore, the state space dimension profile for the
overall trellis is (0,13,13,13,10,13,13,13,0). We see that the maximum state
space dimension is still $\rho_{8,\max}(C) = 13$. However, for a decoder based on this
non-minimal trellis, all 8,192 ACS circuits are active all the time, except at
time-$h_4$. This greatly improves the hardware utilization efficiency.
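
The idle-circuit counts quoted for the two trellises can be checked directly from the state space dimension profiles. The snippet below is a small sketch; the function name `idle_acs` is hypothetical, and the only inputs are the two profiles given in the text.

```python
from typing import List, Sequence


def idle_acs(profile: Sequence[int]) -> List[int]:
    """Idle ACS circuits at the interior boundaries h_1, ..., h_{L-1}.

    The ACS array holds 2**max(profile) circuits, and 2**rho of them are
    active at a boundary whose state space dimension is rho.
    """
    rho_max = max(profile)
    return [2 ** rho_max - 2 ** rho for rho in profile[1:-1]]


minimal_profile = (0, 7, 10, 13, 10, 13, 10, 7, 0)
decomposed_profile = (0, 13, 13, 13, 10, 13, 13, 13, 0)

idle_minimal = idle_acs(minimal_profile)
idle_decomposed = idle_acs(decomposed_profile)
```

For the minimal trellis the list is nonzero at five of the seven interior boundaries, whereas for the decomposed trellis only the entry for time-$h_4$ is nonzero.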

For a trellis (minimal or non-minimal) that consists of parallel subtrellises, all 
the subtrellis decoders operate independently in parallel without communica- 
tion between them. From the standpoint of speed, the effective computational 
complexity of decoding a received sequence is defined as the computational 
complexity of a single parallel subtrellis (viz. the minimal trellis for a sub- 
code C') plus the cost of the final comparison among the survivors presented 
by each of the subtrellis decoders. The time required for final comparison is 
generally small relative to the time required for processing a subtrellis and this 
comparison can be pipelined. Since all the subtrellises are processed in parallel, 
the speed of decoding is therefore limited only by the time required to process 
one subtrellis. If a minimal trellis does not have enough parallel structure and 
decoding speed is critical, parallel decomposition can be used to reduce the 
effective computational complexity and thus to gain speed. 

10.3.2 Complexity of the ACS Circuit 

The converging branch dimension profile (CBDP), $(\delta_1, \delta_2, \ldots, \delta_L)$, defined in
Section 6.2 also affects decoding speed and complexity. Each component $\delta_i$
is the base-2 logarithm of the number of composite branches converging into
a state at a particular level of the trellis. The number $2^{\delta_i}$ is called a radix
number in the IC literature. At level-$i$ of the trellis, each ACS circuit has to
perform $\delta_i$ stages of a tree-type two-way comparison to find the best incoming
branch. Hence, a reduction in the values of the components in the CBDP of
a trellis will improve decoding speed and reduce the complexity of each ACS 
circuit. If the radix numbers in a minimal trellis are too large, then parallel 
decomposition can be used to achieve smaller radix numbers, and hence to 
reduce the complexity of each ACS circuit and to increase the decoding speed. 
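
To make the link between the radix number and the comparison depth concrete, the following sketch selects the best of $2^{\delta}$ candidate metrics with a tree of two-way comparisons and counts the stages. It is a software illustration only; the function name `tree_select` and the sample metrics are hypothetical, and a hardware ACS circuit would perform the comparisons within a stage in parallel.

```python
from typing import List, Sequence, Tuple


def tree_select(candidates: Sequence[float]) -> Tuple[float, int, int]:
    """Return (best metric, index of best candidate, number of stages).

    The number of candidates must be a power of two (a radix number).
    """
    n = len(candidates)
    assert n > 0 and n & (n - 1) == 0, "expects 2**delta candidates"
    level: List[Tuple[int, float]] = list(enumerate(candidates))
    stages = 0
    while len(level) > 1:
        # One stage: compare neighbours pairwise and keep the larger metric.
        level = [max(level[k], level[k + 1], key=lambda p: p[1])
                 for k in range(0, len(level), 2)]
        stages += 1
    index, best = level[0]
    return best, index, stages


best, index, stages = tree_select([3.0, -1.0, 7.5, 2.0, 0.5, 6.0, -4.0, 1.0])
```

With eight candidates (radix 8, $\delta = 3$) the selection takes three stages; halving the radix removes one stage.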

10.3.3 Traceback Complexity 

Even though the branch and state metrics are computed and updated in the 
Viterbi decoder at every level of the trellis, the best (or most likely) path 
through the trellis must be determined. The process of determining the best 
path is called traceback in the literature. 

Recall that the number of parallel branches in a composite branch in the
$i$-th section of an $L$-section trellis for $C$ with section boundary locations in
$\{h_0, h_1, \ldots, h_L\}$ is

$$2^{k(C_{h_{i-1},h_i})}.$$

For a Viterbi decoder based on the minimal trellis, the ACS-$l$ corresponding to
state $\sigma^{(l)} \in \Sigma_{h_i}(C)$ for $1 \le i \le L$ must store $\delta_i(C) + k(C_{h_{i-1},h_i})$ bits in order
to identify which of the $2^{\delta_i(C)}$ composite branches converging into state $\sigma^{(l)}$ is
chosen and which of the $2^{k(C_{h_{i-1},h_i})}$ parallel branches survives. Therefore, each
ACS-$l$ needs to store

$$\sum_{i=1}^{L} \bigl( \delta_i(C) + k(C_{h_{i-1},h_i}) \bigr) = K \tag{10.2}$$

bits in order to identify the sequence of surviving incoming branches and to
determine the decoded path. If this number is too large, parallel decomposition
can be used to reduce it. Consider a non-minimal trellis with $2^q$ parallel
subtrellises obtained by parallel decomposition of the minimal $L$-section trellis
for $C$ based on a subcode $C'$. If a Viterbi decoder is designed based on this
non-minimal trellis, then the number of bits that must be stored for each ACS-$l$
is

$$\sum_{i=1}^{L} \bigl( \delta_i(C') + k(C'_{h_{i-1},h_i}) \bigr) = \dim(C'). \tag{10.3}$$

Since $\dim(C') = K - q < K$, the ACS circuits based on this non-minimal trellis
design require less storage than those for the design based on the minimal
trellis. The total savings in storage in all the ACS circuits are

$$2^{\rho_{L,\max}(C)} \bigl( K - \dim(C') \bigr). \tag{10.4}$$

This is a significant savings.
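
As a rough numerical illustration of (10.2) to (10.4), consider the (64,42) RM code discussed in Section 10.3.1, for which $K = 42$ and the maximum state space dimension of the 8-section trellis is 13. The snippet assumes that the decomposition into 128 parallel subtrellises corresponds to $q = 7$ (since $128 = 2^7$), so that $\dim(C') = 35$; this identification and the function name `traceback_bits_saved` are assumptions made for this example.

```python
def traceback_bits_saved(rho_max: int, K: int, q: int) -> int:
    """Total traceback storage saved over all ACS circuits, following (10.4)."""
    bits_minimal = K           # per-ACS storage for the minimal trellis, (10.2)
    bits_decomposed = K - q    # per-ACS storage after decomposition, (10.3)
    return 2 ** rho_max * (bits_minimal - bits_decomposed)


saved = traceback_bits_saved(rho_max=13, K=42, q=7)
```

Under these assumptions each of the 8,192 ACS circuits stores 35 bits instead of 42, for a total saving of 8,192 x 7 bits.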

10.3.4 ACS-Connectivity

The hardware implementation of a Viterbi decoder is severely affected by the 
physical placement of the ACS circuits and the need to route information be- 
tween them. The routing complexity should be minimized in a Viterbi decoder 
IC design in order to reduce the size of the IC chip. 

The basic operations performed by an ACS circuit are: addition of the branch
metrics of the incoming branches to the state metrics of the corresponding
originating states, comparison of the resulting sums to find the best one, and
selection of the surviving sum as the new state metric together with the
corresponding surviving branch label. The ACS-array architecture is usually
dominated by the area required by the interconnections to transfer the state
metrics. For a state $\sigma^{(l)} \in \Sigma_{h_i}(C)$ with $1 \le l \le |\Sigma_{h_i}(C)|$ and
$0 \le i < L$, let $Q_i(\sigma^{(l)})$ denote the set of states in $\Sigma_{h_{i+1}}(C)$ that are
adjacent to $\sigma^{(l)}$. For $l > |\Sigma_{h_i}(C)|$, $Q_i(\sigma^{(l)}) = \emptyset$.

Then in the ACS-array implementation of a Viterbi decoder, paths to transfer
the state metrics exist between ACS-$l$ and all the ACS circuits that correspond
to the states in

$$Q_0(\sigma^{(l)}) \cup Q_1(\sigma^{(l)}) \cup \cdots \cup Q_{L-1}(\sigma^{(l)}). \tag{10.5}$$

The above set, denoted $Q^{(l)}$, defines the connectivity of ACS-$l$ in the ACS-array
corresponding to state $\sigma^{(l)}$. We call $|Q^{(l)}|$ and $q^{(l)} = \log_2 |Q^{(l)}|$ the connectivity
and connectivity dimension of the ACS-$l$, respectively. The connectivities of
ACS circuits determine the areas on an IC chip needed for wiring [10, 38]. This
area should be kept as small as possible.

The ACS-connectivity can be reduced by using a non-minimal trellis with a 
proper number of parallel isomorphic subtrellises. With such a trellis, the ACS 
circuits can be divided into blocks such that the ACS circuits corresponding to 
states in a single subtrellis form a block. A particular ACS circuit only needs
to transfer its metric to a subset of ACS circuits within its own block. This
greatly reduces the ACS-connectivity and hence the hardware complexity and
the wiring area on the IC chip.

10.3.5 Branch Complexity 

The decoding speed of a Viterbi decoder depends on the total number of 
branches in the trellis to be processed and how fast they are being processed. 

If the processing load is shared by many ACS circuits at any time instant, then 
each ACS circuit will carry a small amount of processing load. This will speed 
up the decoding process. Therefore, a more meaningful measure of branch com- 
plexity is the number of branches to be processed by an ACS circuit [73, 101]. 

As pointed out earlier in this section, the number of active ACS circuits can 
be increased by parallel decomposition of a minimal trellis. However, parallel 
decomposition, in general, results in an increase in the number of composite 
branches in a trellis section. If the rate of increase of active ACS circuits is larger 
than the increase rate of composite branches, then the number of branches to 
be processed by each ACS circuit will decrease. The processing load of an
ACS circuit at time-$h_i$ is determined by the number of composite branches
diverging from its corresponding state in $\Sigma_{h_i}(C)$. Therefore the total number
of branches to be processed by an ACS circuit is determined by the diverging
branch dimension profile (DBDP) of the trellis being used in the design.

Consider the minimal 8-section trellis of the (64,42) RM code, $\mathrm{RM}_{3,6}$. The
state space dimension profile of this code is (0,7,10,13,10,13,10,7,0) and its
DBDP is (7,6,6,3,6,3,3,0). Consider section-2 of the trellis. The number
of composite branches in this section is $2^{13}$. However, the number of active
ACS circuits corresponding to the states of the trellis at the end of section-1
is $2^{7} = 128$. Since each state has 64 composite branches diverging from it,
each active ACS circuit must process 64 composite branches. Now consider
the parallel decomposition of this minimal trellis into 128 parallel isomorphic
subtrellises. The resultant non-minimal trellis has a state space dimension
profile (0,13,13,13,10,13,13,13,0) and each subtrellis has a state space
dimension profile (0,6,6,6,3,6,6,6,0). The DBDP of this non-minimal trellis
is (13,3,3,3,6,3,3,0). All the components of this DBDP, except for the first
one, are smaller than (or equal to) the corresponding components of the DBDP
for the minimal 8-section trellis of this code. Consider section-2 of this
non-minimal trellis. The total number of composite branches is now $2^{16}$
($2^{13}$ states, each with $2^{3}$ diverging composite branches), a large
increase from $2^{13}$ for the minimal trellis. However, all the $2^{13}$ ACS circuits
at time-$h_1$ are active and they share the processing load. Each ACS circuit
processes only 8 composite branches, compared to 64 for the minimal trellis.

It is the same at the other time instants, except at time-$h_4$, where each active
ACS circuit needs to process 64 branches, the same as for the minimal trellis.
Therefore, the number of operations performed per ACS circuit is smaller for 
a Viterbi decoder designed based on the above non-minimal trellis. Reducing 
the diverging branch profile also results in a reduction of ACS-connectivity and 
hence a reduction in implementation complexity and wiring area on an IC chip. 
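
The per-circuit load in this comparison follows mechanically from the two profiles: in section $i$, the $2^{\rho}$ states at time-$h_{i-1}$ each have $2^{d}$ diverging composite branches, where $\rho$ and $d$ are the corresponding entries of the state space dimension profile and the DBDP. The sketch below tabulates this for the profiles quoted in the text; the function name `per_acs_load` is hypothetical.

```python
from typing import Sequence, Tuple


def per_acs_load(ssdp: Sequence[int], dbdp: Sequence[int],
                 section: int) -> Tuple[int, int, int]:
    """Return (total composite branches, active ACS circuits, branches per ACS).

    `section` is 1-based; the active circuits are those of the states at the
    start of the section.
    """
    rho = ssdp[section - 1]   # state space dimension at time-h_{section-1}
    d = dbdp[section - 1]     # diverging branch dimension in this section
    return 2 ** (rho + d), 2 ** rho, 2 ** d


minimal = ((0, 7, 10, 13, 10, 13, 10, 7, 0), (7, 6, 6, 3, 6, 3, 3, 0))
decomposed = ((0, 13, 13, 13, 10, 13, 13, 13, 0), (13, 3, 3, 3, 6, 3, 3, 0))

load_min_sec2 = per_acs_load(*minimal, section=2)
load_dec_sec2 = per_acs_load(*decomposed, section=2)
load_dec_sec5 = per_acs_load(*decomposed, section=5)
```

For section-2 the total grows from $2^{13}$ to $2^{13} \cdot 2^{3} = 2^{16}$ composite branches, while the load per active circuit drops from 64 to 8. Section-5, which starts at time-$h_4$, is the one place where the decomposed trellis still requires 64 branches per active circuit.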

Based on the above analysis and discussions, we may conclude that in design- 
ing a hardware Viterbi decoder for a specific linear block code, if the minimal 
trellis for the code is not desirable, then a non-minimal trellis with proper 
structural properties should be considered. 

10.4 DIFFERENTIAL TRELLIS DECODING 

The Viterbi algorithm was first devised for decoding convolutional codes. This 
decoding algorithm is based on the simple principle of add-compare-select 
(ACS) to process the code trellis and eliminate the less probable paths at each 
trellis level. This simple ACS principle has been used for implementing Viterbi 
decoders over the last two decades. However, a trellis-based decoding algo- 
rithm for convolutional codes can be devised based on a different processing 
principle, namely compare-select-add (CSA). This decoding algorithm is 
devised based on a specific partition of a trellis section and the CSA processing 
principle. It is more efficient than the conventional Viterbi decoding algorithm. 

This decoding algorithm is called the differential trellis decoding (DTD) 
algorithm [32]. 

Consider a rate-$1/n$ $(n, 1, m)$ convolutional code of memory order $m$. The
encoder of this code has one input and $n$ outputs. Let $a = (a_0, a_1, \ldots, a_i, \ldots)$
be the input information sequence. The $n$ corresponding output code sequences
are

$$
\begin{aligned}
u^{(1)} &= (u_0^{(1)}, u_1^{(1)}, \ldots, u_i^{(1)}, \ldots), \\
u^{(2)} &= (u_0^{(2)}, u_1^{(2)}, \ldots, u_i^{(2)}, \ldots), \\
&\;\;\vdots \\
u^{(n)} &= (u_0^{(n)}, u_1^{(n)}, \ldots, u_i^{(n)}, \ldots).
\end{aligned}
$$

At time-$i$, the input to the encoder is $a_i$ and the output of the encoder is a
block of $n$ code bits $(u_i^{(1)}, u_i^{(2)}, \ldots, u_i^{(n)})$. The trellis for this code consists of
$2^m$ states with two branches entering and leaving each state at any time (or
level) greater than $m$.

A rate-$1/n$ $(n, 1, m)$ convolutional code is said to be antipodal if, in the
generator matrix of (9.7), $G_0 = G_m = (1\,1 \cdots 1)$. Most of the best rate-$1/n$
convolutional codes are antipodal. For an antipodal convolutional code, the two
branches entering (or leaving) a state in its code trellis are one's complements
of each other, i.e., if one branch is labeled with $(u_i^{(1)}, u_i^{(2)}, \ldots, u_i^{(n)})$, then the
other branch is labeled with $(1 \oplus u_i^{(1)}, 1 \oplus u_i^{(2)}, \ldots, 1 \oplus u_i^{(n)})$, where $\oplus$
denotes the modulo-2 addition.

At time-$i$, the state of the encoder is defined by and labeled with the
information bits $(a_{i-1}, a_{i-2}, \ldots, a_{i-m})$ stored in the input shift register. Consider
the trellis section from time-$i$ to time-$(i+1)$ for $i \ge m$. This section can
be partitioned into $2^{m-1}$ two-state fully connected subtrellises with the
following structural properties:

(1) The two states at time-$i$ are labeled with $(a_{i-1}, a_{i-2}, \ldots, a_{i-m+1}, 0)$ and
$(a_{i-1}, a_{i-2}, \ldots, a_{i-m+1}, 1)$, respectively.

(2) The two states at time-$(i+1)$ are labeled with $(0, a_{i-1}, a_{i-2}, \ldots, a_{i-m+1})$ and
$(1, a_{i-1}, a_{i-2}, \ldots, a_{i-m+1})$, respectively.

(3) The branches connecting the state $(a_{i-1}, a_{i-2}, \ldots, a_{i-m+1}, 0)$ to the states
$(0, a_{i-1}, a_{i-2}, \ldots, a_{i-m+1})$ and $(1, a_{i-1}, a_{i-2}, \ldots, a_{i-m+1})$ are labeled with the
code blocks $(u_i^{(1)}, u_i^{(2)}, \ldots, u_i^{(n)})$ and $(1 \oplus u_i^{(1)}, 1 \oplus u_i^{(2)}, \ldots, 1 \oplus u_i^{(n)})$,
respectively.

(4) The branches connecting the state $(a_{i-1}, a_{i-2}, \ldots, a_{i-m+1}, 1)$ to the states
$(0, a_{i-1}, a_{i-2}, \ldots, a_{i-m+1})$ and $(1, a_{i-1}, a_{i-2}, \ldots, a_{i-m+1})$ are labeled with the
code blocks $(1 \oplus u_i^{(1)}, 1 \oplus u_i^{(2)}, \ldots, 1 \oplus u_i^{(n)})$ and $(u_i^{(1)}, u_i^{(2)}, \ldots, u_i^{(n)})$,
respectively.

The structure of such a two-state subtrellis is depicted in Figure 10.1. These
$2^{m-1}$ fully connected subtrellises are commonly called "butterflies".
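
The butterfly label structure can be checked on a small example. The sketch below uses the well-known rate-1/2, memory-2 code with generator polynomials 111 and 101 (octal 7 and 5), which is antipodal because both generators have leading and trailing coefficient 1. The choice of this code and the helper names `branch_label` and `butterfly` are assumptions made for this example.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

Bits = Tuple[int, ...]


def branch_label(gens: Sequence[Bits], a_i: int, state: Bits) -> Bits:
    """Output block for input a_i and state (a_{i-1}, ..., a_{i-m})."""
    register = (a_i,) + tuple(state)
    return tuple(sum(g * b for g, b in zip(gen, register)) % 2 for gen in gens)


def butterfly(gens: Sequence[Bits], alpha: Bits) -> Dict[Tuple[int, int], Bits]:
    """Labels of the four branches of the butterfly with common part alpha.

    Keys are (a_{i-m}, a_i): the last bit of the state at time-i and the
    input bit, which becomes the first bit of the state at time-(i+1).
    """
    return {(last, a_i): branch_label(gens, a_i, tuple(alpha) + (last,))
            for last in (0, 1) for a_i in (0, 1)}


GENS = ((1, 1, 1), (1, 0, 1))   # generators 7 and 5 (octal), m = 2
m = len(GENS[0]) - 1
butterflies = {alpha: butterfly(GENS, alpha)
               for alpha in product((0, 1), repeat=m - 1)}
```

For each of the $2^{m-1} = 2$ butterflies, the entries keyed (0,0) and (1,1) coincide and the entries keyed (0,1) and (1,0) are their bitwise complement, matching properties (3) and (4).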

Based on the above state grouping and trellis partitioning between time-$i$
and time-$(i+1)$, each subtrellis can be labeled by an $(m-1)$-tuple
$\alpha = (a_{i-1}, a_{i-2}, \ldots, a_{i-m+1})$. In each subtrellis-$\alpha$, the states at time-$i$ and the
states at time-$(i+1)$ are represented by $(\alpha, a_{i-m})$ and $(a_i, \alpha)$, respectively,
with $a_{i-m}, a_i \in \{0, 1\}$.

The decoding algorithm to be presented in the following is based on the
above trellis partition. Assume that BPSK is used for transmission and each
BPSK signal has unit energy. A code sequence is mapped into a bipolar signal
sequence for transmission. The $i$-th code block $(u_i^{(1)}, u_i^{(2)}, \ldots, u_i^{(n)})$ is mapped
into the following bipolar sequence:

$$(2u_i^{(1)} - 1,\; 2u_i^{(2)} - 1,\; \ldots,\; 2u_i^{(n)} - 1). \tag{10.6}$$

Suppose correlation is used as the decoding metric. Let
$r_i = (r_i^{(1)}, r_i^{(2)}, \ldots, r_i^{(n)})$
be the received block in the interval between time-$i$ and time-$(i+1)$. It follows
from properties (3) and (4) of a butterfly subtrellis given above that the four
branch metrics between time-$i$ and time-$(i+1)$ in subtrellis-$\alpha$ take two opposite
values $\pm N_{i+1}^{\alpha}$ with

$$N_{i+1}^{\alpha} = \sum_{j=1}^{n} (2u_i^{(j)} - 1)\, r_i^{(j)}. \tag{10.7}$$

Let $M_i(\alpha, 0)$ and $M_i(\alpha, 1)$ denote the cumulative correlation metrics that
have survived at time-$i$ for states $(\alpha, 0)$ and $(\alpha, 1)$, respectively. Define

$$\Delta_{i+1}^{\alpha}(0,1) = M_i(\alpha, 0) - M_i(\alpha, 1) \tag{10.8}$$

as the difference between these two metrics computed at time-$(i+1)$. Then
at time-$(i+1)$, the difference between the cumulative metric candidates
corresponding to transitions from states $(\alpha, 0)$ and $(\alpha, 1)$ to state $(a_i, \alpha)$ is given
by

$$D_{i+1}^{\alpha}(a_i) = \Delta_{i+1}^{\alpha}(0,1) - 2(2a_i - 1) N_{i+1}^{\alpha} \tag{10.9}$$

for $a_i \in \{0, 1\}$.

Note that MLD maximizes the correlation metric. Hence, from (10.9), we
conclude that at state $(a_i, \alpha)$ of subtrellis-$\alpha$, we select the branch diverging
from state $(\alpha, 0)$ if $\Delta_{i+1}^{\alpha}(0,1) > 2(2a_i - 1) N_{i+1}^{\alpha}$, and the branch diverging from
state $(\alpha, 1)$ otherwise. Therefore, this decision can be made by first determining

$$|M_{\Delta,N}| = \max\{ |\Delta_{i+1}^{\alpha}(0,1)|,\; |2N_{i+1}^{\alpha}| \}, \tag{10.10}$$

and then checking the sign of the value $M_{\Delta,N}$ corresponding to this maximum,
denoted $\mathrm{sgn}(M_{\Delta,N})$. Based on the comparison result given in (10.10) and
$\mathrm{sgn}(M_{\Delta,N})$, the selection of the surviving branches into states $(a_i, \alpha)$ with
$a_i \in \{0, 1\}$ is made. All the four selections of surviving branches are shown in
Figure 10.2. The selection rules are given below:

(1) If $|\Delta_{i+1}^{\alpha}(0,1)| > |2N_{i+1}^{\alpha}|$ and $\Delta_{i+1}^{\alpha}(0,1) > 0$, the two branches diverging
from state $(\alpha, 0)$ into states $(0, \alpha)$ and $(1, \alpha)$ are selected as the surviving
branches.

(2) If $|\Delta_{i+1}^{\alpha}(0,1)| > |2N_{i+1}^{\alpha}|$ and $\Delta_{i+1}^{\alpha}(0,1) < 0$, the two branches diverging
from state $(\alpha, 1)$ into states $(0, \alpha)$ and $(1, \alpha)$ are selected as the surviving
branches.

(3) If $|\Delta_{i+1}^{\alpha}(0,1)| \le |2N_{i+1}^{\alpha}|$ and $2N_{i+1}^{\alpha} > 0$, the branch diverging from state
$(\alpha, 0)$ into state $(0, \alpha)$ and the branch diverging from state $(\alpha, 1)$ into
state $(1, \alpha)$ are selected as the surviving branches.

(4) If $|\Delta_{i+1}^{\alpha}(0,1)| \le |2N_{i+1}^{\alpha}|$ and $2N_{i+1}^{\alpha} < 0$, the branch diverging from state
$(\alpha, 0)$ into state $(1, \alpha)$ and the branch diverging from state $(\alpha, 1)$ into
state $(0, \alpha)$ are selected as the surviving branches.

Figure 10.2. Branch selections for subtrellis-a.

For each subtrellis-a, the decoding process from time-i to time-(i + 1) can
be carried out as follows:

Step-1 Compute the four possible branch metrics $\pm N^{a}_{i+1}$ (preprocessing)
and scale them by 2.

Step-2 From Step-1, identify $2N^{a}_{i+1}$ and compute the metric difference
$\Delta^{a}_{i+1}(0,1)$.

Step-3 Compare $|\Delta^{a}_{i+1}(0,1)|$ with $|2N^{a}_{i+1}|$.

Step-4 Based on the comparison result of Step-3, determine either
$\mathrm{sgn}(\Delta^{a}_{i+1}(0,1))$ or $\mathrm{sgn}(N^{a}_{i+1})$, and select
the surviving branches based on the selection rule shown in Figure 10.2.

Step-5 For each state at time-(i + 1), update the new survivor metric based
on Step-4.

The above decoding algorithm is called the differential CSA-algorithm. The
metric computations in Step-1, which are also performed by the conventional
Viterbi algorithm, can be preprocessed since at most $2^{n-1}$ values must be
computed. Also, if the branch from state $(a, a_{i-m})$ to state $(a_i, a)$
survives, the surviving metric at state $(a_i, a)$ in Step-5 can be computed as
follows:

    $M_{i+1}(a_i, a) = M_i(a, a_{i-m}) + (2a_{i-m} - 1)(2a_i - 1) N^{a}_{i+1}$.    (10.11)
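
Steps 2 to 5 and the four selection rules can be sketched for a single 2-state subtrellis as follows. This is a minimal illustration rather than a reference implementation: the function names `acs_butterfly` and `csa_butterfly` are hypothetical, the branch metric signs are taken from (10.11), and the handling of exact ties (which the selection rules leave open) is an assumption.

```python
def acs_butterfly(m0, m1, n_metric):
    """Conventional add-compare-select on one 2-state subtrellis.

    m0, m1   : survivor metrics M_i(a,0) and M_i(a,1)
    n_metric : N^a_{i+1}; the branch (a,b) -> (c,a) is assumed to carry
               the metric (2b-1)(2c-1)*N, as in (10.11)
    Returns [(metric, predecessor bit)] for destinations (0,a) and (1,a).
    """
    out = []
    for c in (0, 1):
        s = 2 * c - 1
        cand0 = m0 - s * n_metric  # candidate arriving from (a,0)
        cand1 = m1 + s * n_metric  # candidate arriving from (a,1)
        out.append((cand0, 0) if cand0 > cand1 else (cand1, 1))
    return out


def csa_butterfly(m0, m1, n_metric):
    """Differential compare-select-add on the same subtrellis."""
    two_n = 2 * n_metric            # Step-1: scaling by 2 (a shift in hardware)
    delta = m0 - m1                 # Step-2: one subtraction
    if abs(delta) > abs(two_n):     # Step-3: one comparison
        # Rules (1) and (2): both survivors leave the same state.
        pred = (0, 0) if delta > 0 else (1, 1)
    else:
        # Rules (3) and (4); ties are sent here (an assumption).
        pred = (0, 1) if two_n > 0 else (1, 0)
    prev = (m0, m1)
    # Step-5: one addition per destination state, using (10.11).
    return [(prev[b] + (2 * b - 1) * (2 * c - 1) * n_metric, b)
            for c, b in enumerate(pred)]


if __name__ == "__main__":
    print(csa_butterfly(5, 3, 2))   # rule (3) applies: |2| < |4| and 2N > 0
    print(acs_butterfly(5, 3, 2))
```

In this sketch the differential routine spends one subtraction, one comparison, and two additions per subtrellis, whereas the conventional routine spends four additions and two comparisons. Both return the same survivor metrics.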

Note that the scaling by 2 of the preprocessed values $\pm N^{a}_{i+1}$ at Step-1
and the sign checks at Step-4 are elementary binary operations (scaling is done
by shifting the register once). The real number operations are performed at
Step-2, Step-3, and Step-5. There are $2^{m-1}$ subtractions at Step-2,
$2^{m-1}$ comparisons at Step-3, and $2^m$ additions at Step-5. Therefore, a
total of $2 \cdot 2^m$ real number operations is required to process a trellis
section. In contrast, after Step-1, the conventional Viterbi algorithm requires
$2^{m+1}$ additions to evaluate the cumulative metrics for $2^m$ states and
$2^m$ comparisons to determine the $2^m$ survivors. This results in a total of
$3 \cdot 2^m$ real number operations to process a trellis section. As a result,
the differential CSA-algorithm requires about one-third fewer real number
operations than the conventional Viterbi algorithm for rate-1/n antipodal
convolutional codes as well as high-rate punctured codes obtained from them.
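
The tally above is easy to reproduce for any memory order m. The helper below is a small sketch with a hypothetical name, `real_ops_per_section`. It only restates the counts given in the text (operations after the Step-1 preprocessing) and does not measure an actual decoder.

```python
def real_ops_per_section(m):
    """Real-number operations per trellis section for a 2**m-state,
    rate-1/n antipodal code, counted after the Step-1 preprocessing.

    Returns (differential CSA total, conventional Viterbi total).
    """
    csa = {
        "subtractions (Step-2)": 2 ** (m - 1),
        "comparisons (Step-3)": 2 ** (m - 1),
        "additions (Step-5)": 2 ** m,
    }
    viterbi = {
        "additions": 2 ** (m + 1),
        "comparisons": 2 ** m,
    }
    return sum(csa.values()), sum(viterbi.values())


if __name__ == "__main__":
    for m in (2, 4, 6, 8):
        csa_total, viterbi_total = real_ops_per_section(m)
        print(m, csa_total, viterbi_total, csa_total / viterbi_total)
```

The ratio of the two totals is 2/3 for every m, which is the one-third saving stated above.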
Example 10.2 Consider the (2,1,6) convolutional code with generating pattern

    [11 01 11 11 00 10 11],

which is the most commonly used convolutional code. This code is antipodal.
Its trellis consists of 64 states and can be decomposed into 32 fully connected
2-state subtrellises as shown in Figure 10.1. Since n = 2, at time-i, there are
four possible branch metrics of the form $\pm(r_i^{(1)} \pm r_i^{(2)})$, which
are computed with two real additions and then scaled by 2. For this code, at
time-i, the Viterbi algorithm computes 128 cumulative metric candidates and
then performs 64 comparisons, so a total of 192 real value operations is
required. The differential CSA-algorithm first computes 32 metric differences
at Step-2, and then performs 32 comparisons at Step-3. Finally, based on the 32
sign checks of Step-4, 64 surviving cumulative metrics are updated at Step-5.
As a result, only 128 real value operations are executed. Therefore, 64 real
value operations are saved by the differential CSA-algorithm at the expense of
32 sign checks and 2 scalings by 2.
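
The structure claimed in this example can be checked numerically. The sketch below rests on two assumptions that are not stated in the text: the generating pattern is read as the two generator sequences interleaved coefficient by coefficient, and the encoder is a plain feedforward shift register whose state holds the last six input bits, newest first. The names `PATTERN`, `label`, and `complement` are hypothetical.

```python
from itertools import product

PATTERN = "11011111001011"  # digits of the generating pattern, spaces removed
# De-interleave into one coefficient list per output (assumed reading).
G = [[int(ch) for ch in PATTERN[j::2]] for j in range(2)]
M = len(G[0]) - 1  # memory order, 6 for this code


def label(a, b, c):
    """Output pair on the branch from state (a, b) to state (c, a).

    a : tuple of the M-1 middle register bits (newest first)
    b : oldest register bit, shifted out on this transition
    c : new input bit
    """
    window = (c,) + a + (b,)
    return tuple(sum(g * x for g, x in zip(gj, window)) % 2 for gj in G)


def complement(lbl):
    return tuple(1 - x for x in lbl)


subtrellises = list(product((0, 1), repeat=M - 1))

# In every 2-state subtrellis, the branches (a,0)->(0,a) and (a,1)->(1,a)
# share one label, and the other two branches carry its complement.
antipodal = all(
    label(a, 0, 0) == label(a, 1, 1)
    == complement(label(a, 0, 1)) == complement(label(a, 1, 0))
    for a in subtrellises
)

if __name__ == "__main__":
    print(len(subtrellises), 2 ** M, antipodal)
```

Under these assumptions the check reports 32 subtrellises, 64 states, and confirms that the two branches leaving (or entering) any state are complements of each other.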

In practical applications, high-rate convolutional codes are often constructed
from a low-rate (n,1,m) convolutional code by puncturing. The trellis for the
punctured code has the same structure and state complexity as that of the
original rate-1/n convolutional code, except that the lengths of its sections vary
periodically. As a result, the decoder for the rate-1/n convolutional code can be
used for decoding the punctured code. If the base rate-1/n convolutional code is
antipodal, then any punctured code constructed from it is also antipodal. Each
trellis section for the punctured code can be partitioned into $2^{m-1}$ butterfly
subtrellises in exactly the same manner as described above. The two branches
leaving (or entering) a state in a butterfly subtrellis are one's complements
of each other. Consequently, the differential CSA-algorithm can be used for
decoding the punctured code. All the rate-k/(k + 1) punctured convolutional
codes presented in [16] are time-varying antipodal codes. Also, this construction
can be generalized to the case where k rate-1/n base convolutional codes, rather
than only one, are periodically selected with period k. Again, if the resulting
time-varying punctured code is antipodal, then the differential CSA-algorithm
can be used.
The application of the differential CSA-algorithm to rate-k/n convolutional
codes with k > 1 also allows a 1/3 saving in real value computations after proper
pairing of the states in the code trellis [32]. The differential CSA-algorithm
can also be applied efficiently to trellis decoding of block codes. For example,
trellis decoding based on the 4-section trellis diagram for the (16,5) RM code
requires 59 real value operations for the differential CSA-algorithm and 95 real
value operations for the conventional Viterbi algorithm.




